Human and Machine Judgements for Russian Semantic Relatedness
نویسندگان
چکیده
Semantic relatedness of terms represents similarity of meaning by a numerical score. On the one hand, humans easily make judgements about semantic relatedness. On the other hand, this kind of information is useful in language processing systems. While semantic relatedness has been extensively studied for English using numerous language resources, such as associative norms, human judgements and datasets generated from lexical databases, no evaluation resources of this kind have been available for Russian to date. Our contribution addresses this problem. We present five language resources of different scale and purpose for Russian semantic relatedness, each being a list of triples (wordi,wordj , similarityij). Four of them are designed for evaluation of systems for computing semantic relatedness, complementing each other in terms of the semantic relation type they represent. These benchmarks were used to organise a shared task on Russian semantic relatedness, which attracted 19 teams. We use one of the best approaches identified in this competition to generate the fifth high-coverage resource, the first open distributional thesaurus of Russian. Multiple evaluations of this thesaurus, including a large-scale crowdsourcing study involving native speakers, indicate its high accuracy.
منابع مشابه
Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures
Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...
متن کاملPresentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures
Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...
متن کاملAutomated Detection of Non-Relevant Posts on the Russian Imageboard "2ch": Importance of the Choice of Word Representations
This study considers the problem of automated detection of non-relevant posts on Web forums and discusses the approach of resolving this problem by approximation it with the task of detection of semantic relatedness between the given post and the opening post of the forum discussion thread. The approximated task could be resolved through learning the supervised classifier with a composed word e...
متن کاملGrieser, Karl, Timothy Baldwin, Fabian Bohnert and Liz Sonenberg (2011) Using Ontological and Document Similarity to Estimate Museum Exhibit Relatedness, ACM Journal of Computing and Cultural Heritage 3(3), pp. 1-20
Exhibits within Cultural Heritage collections such as museums and art galleries are arranged by experts with intimate knowledge of the domain, but there may exist connections between individual exhibits that are not evident in this representation. For example, the visitors to such a space may have their own opinions on how exhibits relate to one another. In this paper, we explore the possibilit...
متن کاملRandom Walk on WordNet to Measure Lexical Semantic Relatedness
The need to determine semantic relatedness or its inverse, semantic distance, between two lexically expressed concepts is a problem that pervades much of natural language processing such as document summarization, information extraction and retrieval, word sense disambiguation and the automatic correction of word errors in text. Standard ways of measuring similarity between two words on a thesa...
متن کامل